From Text to Pathway: Corpus Annotation for Knowledge Acquisition from Biomedical Literature
Abstract
We present a new direction of research that uses text mining technologies to construct and maintain databases organized in the form of pathways, by associating parts of papers with the relevant portions of a pathway and vice versa. To realize this scenario, we present two annotated corpora. The first, Event Annotation, identifies the spans of text in which biological events are reported; the second, Pathway Annotation, associates portions of papers with specific parts of a pathway.
Similar resources
Building a Bio-Event Annotated Corpus for the Acquisition of Semantic Frames from Biomedical Corpora
This paper reports on the design and construction of a bio-event annotated corpus which was developed with a specific view to the acquisition of semantic frames from biomedical corpora. We describe the adopted annotation scheme and the annotation process, which is supported by a dedicated annotation tool. The annotated corpus contains 677 abstracts of biomedical research articles.
Distributional Framework for Emergent Knowledge Acquisition and its Application to Automated Document Annotation
The paper introduces a framework for representation and acquisition of knowledge emerging from large samples of textual data. We utilise a tensor-based, distributional representation of simple statements extracted from text, and show how one can use the representation to infer emergent knowledge patterns from the textual data in an unsupervised manner. Examples of the patterns we investigate ...
Collaborative text-annotation resource for disease-centered relation extraction from biomedical text
Agglomerating results from studies of individual biological components has shown the potential to produce biomedical discovery and the promise of therapeutic development. Such knowledge integration could be tremendously facilitated by automated text mining for relation extraction in the biomedical literature. Relation extraction systems cannot be developed without substantial datasets annotated...
A Corpus of Tables in Full-Text Biomedical Research Publications
The development of text mining techniques for biomedical research literature has received increased attention in recent times. However, most of these techniques focus on prose, while much important biomedical data reside in tables. In this paper, we present a corpus created to serve as a gold standard for the development and evaluation of techniques for the automatic extraction of information f...
BIOTEX: A system for Biomedical Terminology Extraction, Ranking, and Validation
Term extraction is an essential task in domain knowledge acquisition. Although hundreds of terminologies and ontologies exist in the biomedical domain, the language evolves faster than our ability to formalize and catalog it. We may be interested in the terms and words explicitly used in our corpus in order to index or mine this corpus or just to enrich currently available terminologies and ont...